scalar_ge
=========================

.. _scalarge-label:

*G-E interaction analysis via deep learning when the input X is scalar.*

Description
------------

This function fits a neural network with MCP and L\ :subscript:`2` penalization. It simultaneously estimates the model and selects important main G effects and G–E interactions, while respecting the "main effects, interactions" variable selection hierarchy.

See also :ref:`sim_data_scalar ` and :ref:`grid_scalar_ge `. The model is :ref:`ScalarGE `.

Usage
------

.. code-block:: python

    scalar_ge(y, G, E, ytype, num_hidden_layers, nodes_hidden_layer, num_epochs,
              learning_rate1, learning_rate2, lambda1 = None, lambda2 = None,
              Lambda = None, threshold = None, split_type = 0, ratio = [7, 3],
              important_feature = True, plot = True)

Parameters
----------

This part describes the meaning and data type of each parameter. Use the table below to build a customized ScalarGE model.

.. list-table::
   :widths: 30 70
   :header-rows: 1
   :align: center

   * - Parameter
     - Description
   * - **y**
     - array or dataframe, the response variable.
   * - **G**
     - array or dataframe, the scalar genetic variable.
   * - **E**
     - array or dataframe, the scalar environmental variable.
   * - **ytype**
     - string, the type of the output y: "Survival", "Binary" or "Continuous".
   * - **num_hidden_layers**
     - numeric, number of hidden layers in the neural network.
   * - **nodes_hidden_layer**
     - list, the number of nodes in each hidden layer.
   * - **num_epochs**
     - numeric, number of epochs for neural network training.
   * - **learning_rate1**
     - numeric, learning rate of sparse layers.
   * - **learning_rate2**
     - numeric, learning rate of hidden layers.
   * - **lambda1**
     - numeric or None, tuning parameter of the first MCP penalization.
   * - **lambda2**
     - numeric or None, tuning parameter of the second MCP penalization.
   * - **Lambda**
     - numeric or None, tuning parameter of the L\ :subscript:`2` penalization.
   * - **threshold**
     - numeric or None, threshold used in the selection of important features.
   * - **split_type**
     - integer, type of data split. If split_type = 0, the data is divided into a training set and a validation set. If split_type = 1, the data is divided into a training set, a validation set and a test set.
   * - **ratio**
     - list, the ratio of the data split.
   * - **important_feature**
     - bool, ``True`` or ``False``, whether to show the selected features in the output.
   * - **plot**
     - bool, ``True`` or ``False``, whether to show a line plot of the residuals against the number of training epochs.

Value
-------

The function **scalar_ge** returns a tuple containing the training results of the ScalarGE model:

- Residual of the training set.
- Residual of the validation set.
- C index (y is survival) or R\ :superscript:`2` (y is continuous or binary) of the training set.
- C index (y is survival) or R\ :superscript:`2` (y is continuous or binary) of the validation set.
- The trained neural network.
- Important features of gene variables.
- Important features of G-E interaction variables.

Here is an example output for an established model:

.. image:: /_static/scalar_ge.png
   :width: 700
   :align: center

For visualization, this function can also output a line plot of the residuals against the number of training epochs. Here is an example output:

.. image:: /_static/scalar_ge_train.png
   :width: 500
   :align: center

Examples
-------------

Here is a quick example of using this function:

.. code-block:: python

    from GENetLib.sim_data import sim_data_scalar
    from GENetLib.scalar_ge import scalar_ge

    # Network structure and training settings
    ytype = 'Survival'
    num_hidden_layers = 2
    nodes_hidden_layer = [1000, 100]
    learning_rate2 = 0.015
    Lambda = 0.2
    learning_rate1 = 0.09
    lambda2 = 0.09
    num_epochs = 100

    # Simulate scalar G-E data with a survival response
    scalar_survival_linear = sim_data_scalar(rho_G = 0.25, rho_E = 0.3, dim_G = 500,
                                             dim_E = 5, n = 1500, dim_E_Sparse = 2,
                                             ytype = ytype, n_inter = 30)
    y = scalar_survival_linear['y']
    G = scalar_survival_linear['G']
    E = scalar_survival_linear['E']

    # Fit the ScalarGE model
    scalar_ge_res = scalar_ge(y, G, E, ytype, num_hidden_layers, nodes_hidden_layer,
                              num_epochs, learning_rate1, learning_rate2,
                              lambda1 = None, lambda2 = lambda2, Lambda = Lambda)
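
The returned tuple is positional, so it can be convenient to give its elements names before working with them. The sketch below is not part of GENetLib: ``ScalarGEResult``, ``label_result`` and all field names are hypothetical, and it assumes the tuple has exactly the seven elements, in the order, listed under Value. Placeholder values stand in for a real ``scalar_ge`` result so that the snippet runs on its own.

.. code-block:: python

    from collections import namedtuple

    # Hypothetical field names; the order follows the list in the Value section.
    ScalarGEResult = namedtuple(
        "ScalarGEResult",
        [
            "train_residual",
            "valid_residual",
            "train_metric",   # C index or R2 of the training set
            "valid_metric",   # C index or R2 of the validation set
            "model",
            "important_G",
            "important_GE",
        ],
    )

    def label_result(res):
        """Wrap a seven-element result tuple in a namedtuple (assumed layout)."""
        if len(res) != len(ScalarGEResult._fields):
            raise ValueError(
                "expected %d elements, got %d"
                % (len(ScalarGEResult._fields), len(res))
            )
        return ScalarGEResult(*res)

    # Placeholder values only; in practice pass scalar_ge_res from the example.
    demo = label_result((0.8, 0.9, 0.71, 0.68, None, [3, 17], [(3, 1)]))
    print(demo.valid_metric)
    print(demo.important_G)

With a real fit, ``label_result(scalar_ge_res)`` would be used the same way. The length check raises an error if the tuple does not have the assumed layout.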